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Abstract 

Background: Goosegrass {Eleusine indica L), a serious annual weed in the world, has evolved resistance to several herbicides 
including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. 
To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome 
analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. 

Results: The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly 
analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length 
of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 
13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 
12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to 
reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially 
expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate 
genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. 

Conclusion: lh\s is the first large-scale transcriptome sequencing of £ indica using the lllumina platform. Potential genes 
involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a 
reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat 
resistance at the molecular level in goosegrass. 
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Introduction 

Eleusine indica L. (Gaertn), commonly known as goosegrass, is a 
monocot weed belonging to the Poaceae family [1] . Due to its high 
fecundity and a wide tolerance to various environmental factors, 
goosegrass is listed as one of the five most noxious weeds in the 
world and has been reported to be a problem weed for 46 different 
crop species in more than 60 countries [1]. Many herbicides are 
being used to control goosegrass, i.e., bipyridinium herbicides such 
as N, N'-dimethyl-4, 4 '-bipyridinium dichloride (paraquat); 
dinitroaniline herbicides; acetohydroxyacid synthase inhibitors 
such as imazapyr; and acetyl CoA carboxylase inhibitors such as 
fluazifop, glyphosate and glufosinate. However, application of the 
same herbicide for more than three consecutive years resulted in 
goosegrass populations that acquired resistance to the herbicide 
[2-7] . Paraquat, a quick-acting herbicide widely used for the non- 
selective control of weeds both in field crops and orchards, causes 
plant mortality by diverting electrons from photosystem I to 
molecular oxygen, resulting in a serious oxidative damage to the 
exposed tissues [8-9], Weeds can acquire resistance to paraquat 



from extensive exposure (over a period of >10 years) to the 
herbicide [10-12]. 

Current understanding of the molecular mechanism of paraquat 
resistance in higher plants includes sequestration of paraquat to 
the vacuoles and/ or enhanced activity of antioxidative enzymes 
[13-14]. Putrescine has been reported as a competitive inhibitor of 
energy-dependent, saturable transporters that facilitate paraquat 
transport across the plasma membrane [15-17], suggesting 
resistance to paraquat can likely be improved by modulating the 
activity of its transporters [9]. Further characterization of 
resistance mechanisms evolved in E. indica and other weeds to 
paraquat has been hindered due to the lack of genome-level 
information in these species. Next-generation sequencing (NGS) 
technology has rapidly advanced the analysis of genomes and 
transcriptomes in model plant and crop species which can now be 
applied to other species whose genomes have not been sequenced 
[18-19]. NGS has also been widely used for comparative 
transcriptome analysis to identify genes that are differentially 
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expressed across different cultivars or tissues or treatment 
conditions [20-23]. 

In this study, we explored the paraquat resistance mechanisms 
in resistant and susceptible biotypes of E. indica (Figure 1) by 
generating comprehensive de novo transcriptome datasets using 
Illumina platform. Analysis of the gene expression data identified 
unigenes that were assigned to various GO categories and KEGG 
metabolic pathways which can be used for further molecular 
characterization of paraquat resistance mechanisms in E. indica. 

Results 

Illumina Sequencing and de novo Assembly 

Four RNA-seq libraries sequenced from goosegrass seedlings 
were named based on their respective samples: SO - susceptible 
seedlings without paraquat; SQ - susceptible seedlings for mixed 
samples sprayed paraquat 40 min, 60 min and 80 min; R0 - 
resistant seedlings without paraquat; and RQ_ - resistant seedlings 
for mixed samples sprayed paraquat 40 min, 60 min and 80 min. 
SO, SQ, R0 and RQ libraries generated 57.25, 61.44, 66.51 and 
58.66 million raw reads, respectively (Table 1). More than 79.85% 
of all the raw reads used for de novo assembly had Phred-like quality 
scores at the Q20 level (an error probability of 1 %). We obtained 
158,461 (>200 bp) transcripts with an average length of 
1,153.74 bp and anN50 of 2,095 bp. 100,742 (>200 bp) unigenes 
with an average length of 712.79 bp and an N50 of 1,199 bp were 
obtained by using longest transcript in each loci as unigene 
(Table 2). The statistical results showed reducing trend of unigene 
number with increasing length of unigenes. Sequence length 
distribution of unigenes changed from 250 bp to 2000 bp 
(Figure 2). 



Functional annotation of assembled unigenes 

To study the sequence conservation of goosegrass genes with 
other plant species, we used an E-value threshold of 10 5 to 
annotate 35,016 (34.76%), 19,921 (19.77%), 35,983 (35.72%), 
17,574 (17.44%), 31,584 (31.35%) and 12,719 (12.63%) unigenes 
to nr [24], Swiss-Prot [25], TrEMBL [26], CDD [27], Pfam [28] 
and KOG [29] databases, respectively. The BLAST [30] results of 
sequences indicated that 35,016 unigenes had BLAST hits in 
nucleotide sequence database in NCBI database. The majority of 
the annotated nucleotide sequences of goosegrass corresponded to 
those of Poaceae plant species, which including Sorghum bicolor, £ea 
mays, Oryza sativa Japonica Group, Brachypodium distachyon and Oryza 
sativa Indica Group with matching ratios of 32.76%, 13.52%, 
12.35%, 5.45% and 5.30%, respectively (Figure 3). 

Gene ontology assignments were used to classify the functions of 
goosegrass transcripts. A total of 25,926 unigenes (25.74%) were 
assigned at least one GO term and classified into 65 functional 
categories using the complete set of GO terms for three main 
categories: biological process, cellular component and molecular 
function (Figure 4). The largest proportion was represented by 
metabolic process (GO: 0008152, 18.13%) and cellular process 
(GO: 0009987, 15.87%) under biological process; cell (GO: 
0005623, 11.06%) and ceU part (GO: 0044464, 11.06%) under 
cellular component; binding (GO: 0005488, 16.88%) and catalytic 
activity (GO: 0003824, 14.37%) under molecular function. 

In total, 12,719 unigenes were categorized into 25 KOG 
classifications (Figure 5). Among these categories, the cluster for 
"signal transduction mechanisms" (3,160, 24.84%) was the largest 
group, followed by the categories of "posttranslational modifica- 
tion, protein turnover, chaperones" (2,464, 19.37%), "general 




Figure 1. The growth of susceptible (A) and resistant (B) goosegrass biotype at various concentrations of paraquat. The dose of 

paraquat is kg-ai-ha -1 . 

doi:1 0.1 371 /journal.pone.0099940.g001 
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function prediction only" (2,062, 16.21%), "intracellular traffick- 
ing, secretion and vesicular transport" (1,352, 10.63%) and 
"translation, ribosomal structure and biogenesis" (1,195, 9.40%). 
The categories of "cell motility" (17, 0.13%) and "nuclear 
structure" (60, 0.47%) had the fewest corresponding genes. 

The 100,742 assembled sequences were mapped to the 
reference canonical pathways in KEGG. A total of 13,809 
unigenes with 1,208 enzyme commission (EC) numbers were 
assigned to 314 predicted KEGG metabolic pathways. The 
pathways most strongly represented by mapped unigenes were 
"ribosome" (ko 03010, 523 unigenes), "protein processing in 
endoplasmic reticulum" (ko 04141, 415 unigenes), "spliceosome" 
(ko 03040, 408 unigenes), "RNA transport" (ko 03013, 345 
unigenes) and "plant-pathogen interaction" (ko 04626, 326 
unigenes). 

Identification and annotation of differentially expressed 
genes (DEGs) 

Transcripts expression levels were calculated using RPKM 
(Reads per kilobase of exon model per million mapped reads). The 
expression differences of transcripts among the four samples of SO, 
SQ R0 and RQ are summarized in a venn diagram that clearly 
showed the overlapping relationship (Figure 6). Among all the 
transcripts (RPKM>10), 1,024 transcripts were expressed at all of 
the four samples, 388, 398, 454 and 412 transcripts were co- 
expressed in treatments of SO and SQ, R0 and RQ SO and R0, 
SQand RQ respectively. The numbers of each sample specifically 
expressed transcripts was 1,329 (SO), 1,389 (SQ), 1,378 (R0) and 
1,487 (RQ, respectively. 

In differentially expressed genes (DEG) analysis, we defined 
DEG as the fold change of the normalized (RPKM) expression 
values of at least 2 in both directions of log2 ratios 1 and false 
discovery rate (FDR)<0.001 (Figure 7). In total, 35,569 DEGs 
were up-regulated and 32,500 DEGs were down-regulated 
between the samples SO and SQ 30,5 1 8 DEGs were up-regulated 
and 35,020 DEGs were down-regulated between the samples of 
R0 and RQ 34,579 DEGs were up-regulated and 32,515 DEGs 
were down-regulated between the samples of R0 and SO; and 
33,722 DEGs were up-regulated and 22,902 DEGs were down- 
regulated between the samples of RQ and SQ. 

Comparison of transcripts involved in paraquat 
resistance 

ROS pathway. Many genes related to reactive oxygen species 
(ROS) removal pathway were differentially regulated in goosegrass 
biotypes after paraquat treatment which likely contributes to their 
susceptibility or resistance to paraquat. Of the 53 identified genes 

Table 1. Summary of goosegrass transcriptome sequencing. 



that function in the ROS scavenging pathway: 1 1 belonged to the 
glutathione-ascorbate cycle (glutaredoxin, GLR; monodehydroas- 
corbate reductase, MDAR; and glutathione reductase, GR); 34 to 
the glutathione peroxidase (glutathione, GST; and peroxidases, 
POD); 5 to the catalase (CAT) pathway; and 3 to the thioredoxin 
(Trx) pathway (Table 3). The three largest groups of ROS related 
genes were GST (24 genes), POD (10 genes) and GLR (7 genes). 
Highest transcript levels were observed for three genes in all the 
four samples, i.e., POD (comp 31277_c0_seq3), CAT (comp 
34816_c0_seql) and Trx (comp 34820_c0_seql). DEGs analysis 
revealed that most of ROS pathway genes are up-regulated both 
in resistant and susceptible biotypes of E. indica after application of 
paraquat (Table 3). DEGs in the ROS pathway that were down- 
regulated in treatment comparisons are as follows: SQvs SO - two 
GLR genes (comp 38421_c0_seql and comp 23286_c0_seql) and 
one POD (comp 14213_c0_seql); RQ vs R0 - two GLR genes 
(comp 38421_c0_seql and comp 23286_c0_seql), one GR 
(comp 41205_c0_seql), two GST (comp 38536_c0_seql and 
comp 8718_c0_seql), three POD (comp 41522_c0_seql, comp 
29849_c0_seq2 and comp 14213_c0_seql) and one CAT (comp 
34816_c0_seql). However, R0 vs SO comparison revealed: up- 
regulation (>2-fold) of one GST (comp 38536_c0_seql) and two 
POD (comp 41522_c0_seql and comp 1351 l_c0_seql); and 
down-regulation of one GLR (comp 23286_c0_seql) and 
five GST (comp 29100_c0_seq3, comp 27832_c0_seq2, comp 
16427_c0_seq2, comp 31694_c0_seql and comp 32752_c0_seq3). 
Whereas in RQ vs SQ comparison, one MDAR (comp 
29012_c0_seq4) and one GST (comp 17835_c0_seql) was up- 
regulated (> 1-fold), one GLR (comp 38421_c0_seql) and one 
POD (comp 29849_c0_seq2) were down-regulated (>l-fold) 
(Table 3). 

Polyamine metabolism. With reference to genes that are 
related to polyamine metabolism, 10 DEGs were found to encode 
enzymes that catalyze polyamine turnover (Table 4). In two 
comparisons of SQ vs SO and R0 vs SO, two genes (comp 
30623_c0_seq2 and comp 4323_c0_seql) were down-regulated 
whereas others were all up-regulated. Only two genes (comp 
30798_c0_seq2 and comp 10199_c0_seql) were up-regulated 
between RQand R0; comp 30798_c0_seq2 was up-regulated with 
1.07 fold change between RQand SQ while the expressions of 
other genes were not significantly altered (fold change ^1) 
(Table 4). 

Transporter related genes. Among the transcripts related 
to transmembrane transport, intracellular protein transport and 
ATP binding cassette transporters (ABC transporters), we identi- 
fied 9, 5 and 4 genes, respectively (Table 5). Between SQand SO, 
only two transmembrane transport genes (comp 28899_c0_seql 
and comp 28988_cl_seql) and one ABC transporters gene (comp 



Sample 


Raw Data 




Valid Data 






Valid Ratio (reads) 


Read 


Base 


Read 


Base 


Average length 




SO 


57,251,668 


5,725,166,800 


45,250,056 


4,130,568,660 


91.28 


79.04% 


SQ 


61,444,178 


6,144,417,800 


50,266,462 


4,621,243,200 


91.93 


81.81% 


RO 


66,507,532 


6,650,753,200 


52,712,154 


4,792,947,566 


90.93 


79.26% 


RQ 


58,663,262 


5,866,326,200 


46,487,888 


4,231,153,734 


91.02 


79.25% 


All 


243,866,640 


24,386,664,000 


194,716,560 


17,775,913,160 


91.29 


79.85% 



SO: susceptible goosegrass seedlings without paraquat; SQ: susceptible goosegrass seedlings for mixed samples sprayed paraquat 40 min, 60 min and 80 min; R0: 
resistant goosegrass seedlings without paraquat; RQ: resistant goosegrass seedlings for mixed samples sprayed paraquat 40 min, 60 min and 80 min. 
doi:1 0.1 371 /joumal.pone.0099940.t001 
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18030_c0_seql) were down-regulated, other 15 genes were up- 
regulated. In comparison of RQ, vs RO, 1 1 genes were up- 
regulated, while two ABC transporters genes of comp 
34549_c0_seql and comp 23747_c0_seq2 were up-regulated (> 
2-fold). 7 genes were down-regulated, while two genes of comp 
10856_c0_seql (transmembrane transport gene) and comp 
30571_c0_seql (ABC transporters gene) were down-regulated (> 
1.3 fold). Between RO and SO, 11 genes were up-regulated. The 
greatest up-regulated genes were comp 10856_c0_sel (transmem- 
brane transport gene) (>2.44 fold) and comp 37714_c0_seql 
(intracellular protein transport gene) (>2.98 fold). Two ABC 
transporters genes (comp 34549_c0_seql [—1.06 fold] and comp 
18030_c0_seql [—1.32 fold]) were maximally down-regulated 
among the 7 down-regulated genes. In RQ vs SQ_ comparison, 
only three genes, one transmembrane transport gene (comp 
37518_c0_seql [—0.05 fold]), two intracellular protein transport 
genes (comp 25903_c0_seql [—0.35 fold] and comp 
9251_c0_seql [—0.13 fold]) were down-regulated. 5 genes that 
showed highest expression among the 15 up-regulated genes 
included: one transmembrane transport gene (comp 9121_c0_seql 
[1.03 fold]); two intracellular protein transporters, (comp 
37714_c0_seql [1.26 fold]) and (comp 34970_c0_seql [1.01 
fold]); and two ABC transporters genes, (comp 34549_c0_seql 
[1.03 fold]) and (comp 23747_c0_seq2 [1.20 fold]). 

Discussion 

Construction of the transcriptome dataset for E. indica 

For many non-model species, there is very littie genome 
information available for researchers to conduct comprehensive 
investigations into the genetic mechanisms underlying their unique 
features and functions. The recent advances in next-generation 
sequencing technology has been used widely to explore genome 
and transcriptome information associated with important physi- 
ological phenomena in many plant species [21]. Our study has 
generated the first large-scale transcriptome data for goosegrass 
herbicide resistance using high-throughput Illumina sequencing. 
Comparison of the susceptible biotype with the paraquat-resistant 
biotype of goosegrass revealed gene expression regulation network 
that will be helpful to understand the molecular, biochemical and 
physiological processes underlying the paraquat resistance mech- 
anism in goosegrass. 

When paraquat is applied to plants, it causes rapid scorching of 
green tissue following exposure to light, typically within 30 min 
[31]. Therefore, the aerial parts of goosegrass seedlings from the 
two lines and treatments were used to construct RNA-seq libraries 
to perform comparative analysis of DEGs that will likely reveal the 
mechanism of paraquat resistance. To ensure that the mRNAs 
used for RNA-seq was the available but not-degradable RNA, we 
mixed the samples from the equivalent seedlings sprayed paraquat 
40 min, 60 min and 80 min [32]. 

Our analysis of RNA-seq data (194,716,560 sequence reads 
categorized into 158,461 assembled transcripts) identified 100,742 
unigenes, which is significandy larger than those previously 
reported for several transcriptomes analyzed for abiotic stress 
responses (e.g. 29,056 [23], 60,765 [20], 65,340 [21], 79,082 [22]). 
35,016 unigenes were annotated by nr database from 100,742 
unigenes. Although a high number of unigenes were not covered 
the complete protein-coding regions as revealed by BLAST 
alignment, the dataset we reported here still provided the largest 
dataset of different genes representing a substantial part of the 
transcriptome of goosegrass, which probably embraces the 
majority part of genes involved in the sophisticated regulation 
networks for resistant paraquat. The top five species with BLAST 
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Figure 2. Length distribution of unigenes characterized from RNA-seq libraries of goosegrass. 

doi:1 0.1 371 /joumal.pone.0099940.g002 



hits to annotated unigenes from goosegrass were Sorghum bicobr, 
Zea mays, Oryza saliva Japonica Group, Brachypodium distachyon and 
Oryza saliva Indica Group indicative of the conserved genes across 
monocot plant species. 

Genes involved in paraquat resistance 

It is well known that paraquat exerts its phytotoxic effect by 
catalyzing the transfer of electrons from photosystem I of 
chloroplast membranes to molecular oxygen, producing free 
radicals that cause lipid peroxidation and membrane damage 
[8] . Plants are known to possess a detoxification system, that allows 
removal of ROS, consisting of ascorbate and glutathione, as well 
as enzymatic components, e.g. superoxide dismutase, catalase, 
ascorbate peroxidase and glutathione reductase [33]. A proposed 
hypothesis in paraquat resistance is associated with the enhanced 
activity of antioxidative enzymes functioning in cooperation as a 
ROS scavenging cycle [34—35]. However, the enhanced activity of 
the enzymes in this cycle could not be detected in most of the 
paraquat-resistant plants according some earlier observations 
[11,14,36-37]. Compared with the transcriptome of untreated 
goosegrass, most of 53 highly transcribed genes related to ROS 
scavenging were up-regulated in both of susceptible and resistant 
biotypes after paraquat treatment. However, the transcripts had 
no significant differences between RO_ and SQ. Therefore, the 
antioxidant enzyme cycle only provides a temporary protection 
until other unknown mechanisms in paraquat-resistant plants 
ensure long-term survival [14]. 

Polyamines are low molecular weight aliphatic cations that are 
ubiquitous to all living organisms [38]. Several reports have 



described that paraquat treatment led to an increase in some 
polyamines and polyamine feeding also offered high levels of 
protection against paraquat toxicity [12,39-40]. Pretreatment of 
radish (Raphanus sativus L.) with polyamines (especially 1 mmol/L 
spermidine) significantly improved their tolerance to subsequent 
50 umol/L paraquat [41]. In the broadleaf weed Arctotheca 
calendula, some polyamines when applied concomitantly with 
paraquat can reduce the toxicity effects of paraquat. Two 
polyamines, spermidine and cadaverine, were effective in reducing 
paraquat translocation in susceptible A. calendula inducing these 
plants to perform more like resistant in terms of translocation [42] . 
This protective role of polyamines against paraquat stress has been 
also observed in many plants such as sunflower (Helianthus annuus 
L.) [39], rice [Oryza saliva cv. Taichung Native 1) [40], maize [Zea 
mays L. cv. 3377 Pioneer) [16], and some prokaryotes for example 
Escherichia coli [43-44] . In our goosegrass transcriptome, among 1 0 
highly transcribed genes related to polyamines, 8 genes were up- 
regulated after spraying with paraquat in susceptible goosegrass. 
Polyamines are involved in stress responses as growth regulator. 
After spraying paraquat, genes related to polyamines were higher 
compared to the untreated in the susceptible goosegrass. But the 
susceptible plant resiliency was limited and correspondingly most 
of genes related to polyamines were lower in sprayed paraquat 
susceptible plant compared to untreated resistant one. This is 
indicative of the resistant goosegrass having more polyamines to 
resist the toxic effects of paraquat. 8 genes were only slighdy 
downregulated in sprayed resistant plant. Lowered levels of five 
genes were in 0.5, though in gene comp3931_c0_seql fold change 
was 2.45, R0 (RPKM 12.92) were higher than SO (RPKM 2.20), 
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Figure 3. Percentage of conservation of goosegrass unigenes in different monocot species based on top BLAST hits. 
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SQ(RPKM 2.76), and RQ,(RPKM 2.36). Gene comp30798_c0_ 
seq2 were upregulated, and the gene expression level was higher, 
such as R0 (RPKM 59.88) and RQ (RPKM 156.83). Polyamines 
could protect rice leaves against paraquat toxicity, and paraquat 
treatment resulted in a higher putrescine and lower spermidine 
and spermine levels in rice leaves [40] . It suggested that paraquat 
showed different effects of different polyamines. Thus, our findings 



confirm that polyamines are involved in paraquat resistance in 
goosegrass, and the role of different polyamines in paraquat 
resistance should be further investigated. 

Previous reports proposed that some transporters, such as EmrE 
[45], PotE [46], PrqA, MvrA [47], CAT4 [48], AtPDR 1 1 [9] and 
RMV1 [49], are presumed to play a role in the resistance 
mechanism or to function by carrying paraquat to a metabolically 




Biological Process 



Cellular Component 



Molecular Function 



Figure 4. GO classifications of goosegrass unigenes. The results were summarized in three main categories: biological process, cellular 
component and molecular function by GO analysis. 
doi:1 0.1 371 /journal.pone.0099940.g004 
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F Nucleotide transport and metabolism 

G Carbohydrate transport and metabolism 

H Coenzyme transport and metabolism 

I Lipid transport and metabolism 
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K Transcription 

L Replication, recombination and repair 

M Cell wall/membrane/envelope biogenesis 

N Cell motility 

O Posttranslational modification, protein turnover, chaperones 

P Inorganic ion transport and metabolism 
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W Extracellular structures 
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Z Cytoskeleton 



Figure 5. KOG classification of putative proteins corresponding to goosegrass unigenes. All 12,719 putative proteins shown significant 
homology to those in KOG database were function classified into 25 molecular families. Right Y-axis indicates percentage of a specific category of 
genes in each main classification. Left Y-axis represents number of genes in a classification. 
doi:1 0.1 371 /journal.pone.0099940.g005 



SO SQ 




Figure 6. Venn diagram showing the genes expressed in each of four samples of goosegrass transcriptomes (RPKM>10). SO: 

susceptible goosegrass seedlings without paraquat; SQ: susceptible goosegrass seedlings for mixed samples sprayed paraquat 40 min, 60 min and 
80 min; R0: resistant goosegrass seedlings without paraquat; RQ: resistant goosegrass seedlings for mixed samples sprayed paraquat 40 min, 60 min 
and 80 min. 

doi:1 0.1 371 /journal.pone.0099940.g006 
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Figure 7. Scatter plot analysis of four sample pairs (SO vs SQ, RO vs RQ, RO vs SO and RQ vs SQ) from goosegrass. DEGs were 
determined using a threshold of log 2 Ratio >1 and FDR<0.001. SO: susceptible goosegrass seedlings without paraquat; SQ: susceptible goosegrass 
seedlings for mixed samples sprayed paraquat 40 min, 60 min and 80 min; RO: resistant goosegrass seedlings without paraquat; RQ: resistant 
goosegrass seedlings for mixed samples sprayed paraquat 40 min, 60 min and 80 min. Red spots represent up-regulated DEGs and green spots 
indicate down-regulated DEGs. Those shown in blue are Transcripts that did not show obvious changes. 
doi:1 0.1 371 /journal.pone.0099940.g007 



inactive compartment [50-51]. In this study, 18 genes corre- 
sponded to transmembrane transport, intracellular protein trans- 
port and ABC transporters. Most of these genes showed lower 
level of RPKM in susceptible goosegrass both in untreated and 
paraquat sprayed plants. 11 of 18 genes related to transport were 
up-regulated in the treatments between untreated resistant and 
susceptible goosegrass, while 15 of 18 genes were up-regulated in 
the treatments of compared resistant and susceptible goosegrass 
after spraying paraquat. This suggests that some transporters and 



the transport process they are involved in may play an important 
function in goosegrass resistance to paraquat. 

Conclusions 

The resistant and susceptible biotypes of E. indica, with or 
without paraquat, were used to generate the first large-scale 
transcriptome sequencing data using Illumina platform. The 
assembled sequences represented a considerable portion of the 
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transcriptome of this species. The sequence analysis generated 
194,716,560 valid reads with an average length of 91.29 bp. De 
novo assembly produced 1 58,46 1 transcripts with an average length 
of 1153.74 bp and 100,742 unigenes with an average length of 
712.79 bp. 25,926 unigenes were assigned to 65 GO terms. A total 
of 13,809 unigenes were assigned to 314 predicted KEGG 
metabolic pathways, and 12,719 unigenes were categorized into 
25 KOG classifications. The polyamine metabolism and transport 
related genes identified as DEGs provided a functional interpre- 
tation of paraquat resistance in goosegrass. Specific functions of 
these genes in acquired paraquat resistance can be further 
investigated using the transgenic approach. Collectively, our 
dataset will serve as a useful resource for further studies on the 
molecular mechanisms of paraquat resistance and accelerate the 
discovery of specific paraquat-resistance genes in E. indica. 

Materials and Methods 

Plant materials and experimental treatment 

A resistant (R) biotype of E. indica (goosegrass) was collected 
from the Teaching and Research Farm (113°40'E, 22°80'N) in 
Panyu District of Guangzhou, China, where papaya (Carica papaya 
L.) and banana (Musa nana Lour.) are cultivated and paraquat is 
used to control weeds continuously for ~20 years. The susceptible 
(S) biotype was collected from the campus of South China 
Agricultural University (113°36'E, 23°16'N). The paraquat 
resistant biotype was confirmed prior to performing experiments 
(Figure 1). Goosegrass seedling cultivation and paraquat treatment 
were performed as follows: seeds were scarified with sandpaper, 
sterilized for 10 min in 3% NaCIO, washed three times followed 
by 24 h imbibition in double distilled water, and then germinated 
in the plastic boxes (22x15.5x7 cm) which contained with a 2:1:1 
mixture of soil: peat: sand in a growth chamber at 34°C/28°C 
(day/night) with a 12 h photoperiod at a light intensity of 
800±200 MEm~ 2 -s _1 . 14 days after sowing (DAS), seedlings of 
both S/R biotypes were transplanted into 24 pots (9x7 cm), each 
containing 6 plants. 21 DAS, both S/R biotypes seedlings at the 
five leaf stage were sprayed with paraquat (Syngenta, China) of 
0.6 kg-ai-ha 1 (the recommended rate). The aboveground parts 
were taken from both untreated seedlings and treated seedlings 
sprayed with paraquat for 40 min, 60 min and 80 min, respec- 
tively. The collected samples were then immediately frozen in 
liquid nitrogen and stored at — 80°C for further experimentation. 

Following samples from four different treatments were collected 
for next-generation sequencing: (1) susceptible goosegrass seedlings 
without paraquat (SO); (2) susceptible goosegrass seedlings for 
mixed samples sprayed paraquat 40 min, 60 min and 80 min 
(SQ); (3) resistant goosegrass seedlings without paraquat (R0); and 
(4) resistant goosegrass seedlings for mixed samples sprayed 
paraquat 40 min, 60 min and 80 min (RQ). 

RNA isolation and cDNA library construction 

Total RNA was obtained from seedlings using the total RNA 
purification kit (LC Sciences, Houston, TX, USA) and was further 
purified using TruSeq RNA LT Sample Prep Kit v2 (Illumina, 
CA, USA) according to the manufacturer's protocol. Oligo-dT 
beads were used to yield poly (A+) mRNA from a total RNA pool 
consisting of equal quantities of total RNA from four sample types 
of SO, SQ, R0 and RQ. The purified mRNAs were fragmented by 
using divalent cations under elevated temperatures, and then 
converted to dsDNA by two rounds of cDNA synthesis using 
reverse transcriptase and DNA polymerase I. After an end repair 
process, DNA fragments were ligated with adaptor oligos [24]. 
The ligated products were amplified using 15 cycles of PCR to 
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generate an RNA-seq library. cDNA sequencing was performed 
using a Genome Analyzer IIx (Illumina). 

De novo assembly and annotation 

Raw data generated from Solexa were preprocessed to remove 
nonsense sequences including (1) adaptor contamination, (2) reads 
with unknown nucleotides comprising more than 5%, (3) low- 
quality reads with ambiguous sequence "N", and (4) very short 
(35 bp) sequences. Subsequendy, de novo assembly of the clean 
reads was performed using assembly program Trinity [52-53] 
which implements a de Bruijn graph algorithm and a stepwise 
strategy, with the default settings except for the K-mer value (25- 
mer). After assembly, the longest transcript in each loci 
(comp*_c*_) was named as "unigene" using Chrysalis Clusters 
module of Trinity software for subsequent annotation. 

For similarity searches, all assembled unigenes were compared 
with the proteins in the non-redundant (nr) protein database, 
Swiss-Prot, TrEMBL, CDD, Pfam and KOG databases, respec- 
tively, using BLAST with a significance threshold of E-value < 
10 \ Functional categorization by gene ontology (GO) terms was 
performed by the best BLASTX hits from the nr database using 
BLAST2GO software according to molecular function, biological 
process and cellular component ontologies with an E-value 
threshold of 10~ 5 . To further evaluate the integrity of the 
transcriptome library and the effectiveness of the annotation 
process, unigenes were subjected to Clusters of Orthologous 
Groups for Eukaryotic Complete Genomes (KOG) classification. 
The pathway assignments were carried out by sequence searches 
against the Kyoto Encyclopedia of Genes and Genomes (KEGG) 
database and using the BLASTX algorithm with an E-value 
threshold of 10" 3 . 



Differential gene expression profiling 

The expression abundance of each assembled transcript was 
measured through reads per kilobase per million mapped reads 
(RPKM) values. All read were mapped onto the non-redundant set 
of transcripts to quantify the abundance of assembled transcripts. 
Bowtie was used for read mapping and applied for RPKM based 
expression measurement. The expressions of each reads between 
sample pairs (SO vs SQ, R0 vs RQ, R0 vs SO and RQ,vs RQ) were 
calculated using the numbers of reads with a specific match. 
Among the four samples, a minimum of a two-fold difference in 
log 2 expression were considered as expression differences. 

Accession for RNA-seq data 

The RNA-seq data generated in the study have been uploaded 
into the NCBI-SRA database under the accession number 
SRR1 181642. 
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